Live demonstration: A 128-channel spike sorting processor featuring 0.175 μW and 0.0033 mm² per channel in 65-nm CMOS

Authors

Seyed Mohammad Ali Zeinolabedin

Anh-Tuan Do

Dongsuk Jeon

Dennis Sylvester

Tony Tae-Hyoung Kim

Abstract

This paper presents a power- and area-efficient processor for real-time neural spike sorting. We propose a robust spike detector (SD), a feature extractor (FE), and an improved k-means algorithm for better clustering accuracy. Furthermore, a time-multiplexing architecture is used in the SD to reduce dynamic power. A customized 39-kb 8T SRAM is also implemented to minimize leakage and storage area. The proposed processor consumes 0.175 μW/ch, with leakage of 0.03 μW/ch, at 0.54 V and occupies 0.0033 mm²/ch.

Introduction

Multi-electrode intracranial recording technology is required for many applications such as neural prosthetics and neuroscience research [1]. The first critical step in decoding brain signals is detecting spikes and assigning each of them to an individual neuron source. This process is called spike sorting (Fig. 1). Traditionally, neural signals from a recording chip are transmitted to a nearby computer for sorting. However, this approach faces practical limitations due to the high data rates and power consumption it requires [1]. On-chip spike sorters (SS) exhibit much better power efficiency and shorter lag time, which is essential for real-time multi-channel neural signal processing [1-5].

An SS typically consists of (1) a spike detector (SD) to detect and align spikes; (2) a feature extractor and dimensionality reduction (FE & DR) to extract information-rich features from noisy data; (3) a classifier to assign each detected spike to a neuron ID; and (4) a training engine (TE) to train the chip and store cluster means in a memory for use by the classifier [1] (Fig. 1). This work improves clustering accuracy by enhancing the detection, feature extraction, and clustering algorithms. At the same time, several circuit techniques are employed to significantly reduce dynamic and leakage power consumption.
Spike Sorting Operation and Algorithm

Prior to normal operation, the on-chip SS must be trained to identify the mean values of the clusters, each of which represents one neuron source. Once trained, the TE writes the cluster means to a memory for subsequent clustering. During normal operation, the TE is turned off; the features of each detected spike are compared with the cluster means, and the spike is assigned to the cluster at minimum distance.

SD, FE and DR: We propose an integer coefficient detector (ICD), y[n], that offers better detection accuracy than the widely used absolute thresholding (AT) and nonlinear energy operator (NEO) approaches (Fig. 2). The ICD not only filters out noise, which reduces the probability of false alarm (PFA), but also strengthens the signal, which improves detection. All detected spikes are aligned to the maximum slope for better clustering accuracy [1]. After spike detection, an integer coefficient FE, y[n], is also proposed and applied to the aligned spikes. The FE is followed by DR to reduce the number of features, which is critical in reducing the SRAM size (and hence overall system power and area) needed for clustering. The extracted features provide better isolation between different clusters than the original data (Fig. 3(a)).

Extensive simulation on 16 widely used datasets from [2] reveals that reducing 48 features to 4 features (indexed 8, 11, 18, and 25) provides the best clustering performance with standard k-means, compared to the existing discrete derivative (DD) technique [1] with either 24 or 4 features (Fig. 3(b)). It also reduces the required memory capacity. For instance, for a 3-neuron input signal, only 156 bits of storage are required for four 13-bit features, compared to 936 bits for the DD counterpart with 24 features. In addition, the SRAM size is reduced by 6× (from 234 kb to 39 kb) when using only 4 features instead of 24.
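The storage figures quoted above follow from multiplying the number of clusters, the number of features per cluster mean, and the 13-bit feature width. The short Python check below reproduces them; the function name `cluster_storage_bits` is a hypothetical helper introduced only for illustration and is not part of the paper.

```python
def cluster_storage_bits(n_clusters: int, n_features: int,
                         bits_per_feature: int = 13) -> int:
    """Bits needed to hold one mean vector per cluster (hypothetical helper)."""
    return n_clusters * n_features * bits_per_feature


# 3-neuron example from the text: 4 selected features vs. 24 DD features.
proposed_bits = cluster_storage_bits(n_clusters=3, n_features=4)
dd_bits = cluster_storage_bits(n_clusters=3, n_features=24)

# The same 24 / 4 ratio applies to the whole SRAM.
reduction = dd_bits // proposed_bits
sram_kb_after = 234 / reduction

print(proposed_bits, dd_bits, reduction, sram_kb_after)
```

The ratio depends only on the feature count, which is why cutting 24 features to 4 shrinks both the per-channel storage and the total SRAM by the same factor.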
Proposed Clustering Algorithm: The iterative k-means algorithm is a powerful software-based clustering algorithm, but it requires several iterations and a large memory to store the full dataset. In a real-time implementation, iteration is not feasible because data streams in continuously. Furthermore, the number of clusters (k) must be specified by the user, yet determining the number of clusters (neurons) for spike sorting is challenging. The iterative k-means algorithm is therefore not suitable for real-time hardware implementations. We propose an improved k-means algorithm (Fig. 4) in which the number of clusters (k) does not have to be provided. Instead, the approach forms a new cluster for a new data point if its weighted distance to the existing clusters is larger than the distance between the existing ones. This allows all cluster means to be adjusted and to converge even if the initial points are purposely assigned from the same cluster (Fig. 5). When k is specified, clustering accuracy improves further. Analysis over various datasets [2] demonstrates that the performance of the proposed k-means is comparable to that of iterative k-means with 100 iterations running in MATLAB (Fig. 6), and it does not require storage of the full dataset.

Hardware Implementation

Since the FE, DR, classifier, and memory are active only when a spike is detected, they are all clock-gated to save power. The SD is the main contributor to dynamic power (80%) because it is always active and employs a large number of D flip-flops (5625) to store the incoming data stream of 128 channels. In the interleaving architecture [3], all SD DFFs are clocked concurrently, so its dynamic power increases quadratically with channel count (both frequency and load grow linearly). A time-multiplexing architecture (Fig. 7) is designed to avoid data transitions in all registers by means of clock gating and multiplexing. As a result, the SD dynamic power is reduced by 74% for 128 channels (Fig. 8).
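The clustering rule from the Proposed Clustering Algorithm paragraph can be illustrated with a minimal single-pass sketch. Fig. 4 is not reproduced in this text, so several details are assumptions rather than the authors' method: the L1 distance, the running-mean update, the fixed number of cluster slots (`max_clusters`), the merge of the two closest means when all slots are in use, and the names `StreamingKMeans`, `train`, `classify`, and `weight`.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan distance (assumed metric; it needs no multiplier)."""
    return sum(abs(x - y) for x, y in zip(a, b))


class StreamingKMeans:
    """Single-pass clustering sketch; not the paper's exact algorithm."""

    def __init__(self, seeds: Sequence[Sequence[float]],
                 max_clusters: int, weight: float = 1.0) -> None:
        if len(seeds) < 2 or max_clusters < len(seeds):
            raise ValueError("need >= 2 seeds and max_clusters >= len(seeds)")
        self.means: List[List[float]] = [[float(v) for v in s] for s in seeds]
        self.counts: List[int] = [1] * len(self.means)
        self.max_clusters = max_clusters  # assumed fixed number of memory slots
        self.weight = weight              # assumed scaling of the new-point distance

    def classify(self, x: Sequence[float]) -> int:
        """Return the index of the nearest cluster mean."""
        return min(range(len(self.means)),
                   key=lambda i: l1_distance(x, self.means[i]))

    def _closest_pair(self) -> Tuple[float, int, int]:
        """Smallest distance between two existing means, with their indices."""
        return min((l1_distance(self.means[i], self.means[j]), i, j)
                   for i, j in combinations(range(len(self.means)), 2))

    def train(self, x: Sequence[float]) -> int:
        """Consume one feature vector and return the cluster it ends up in."""
        point = [float(v) for v in x]
        nearest = self.classify(point)
        gap, i, j = self._closest_pair()
        if self.weight * l1_distance(point, self.means[nearest]) > gap:
            # The point is farther from every mean than two means are from
            # each other, so it starts a new cluster.
            if len(self.means) < self.max_clusters:
                self.means.append(point)
                self.counts.append(1)
                return len(self.means) - 1
            # Assumption: with all slots in use, merge the two closest means
            # into slot i and reuse slot j for the new cluster.
            n_i, n_j = self.counts[i], self.counts[j]
            self.means[i] = [(n_i * a + n_j * b) / (n_i + n_j)
                             for a, b in zip(self.means[i], self.means[j])]
            self.counts[i] = n_i + n_j
            self.means[j], self.counts[j] = point, 1
            return j
        # Otherwise move the nearest mean toward the point (running mean).
        self.counts[nearest] += 1
        n = self.counts[nearest]
        self.means[nearest] = [m + (v - m) / n
                               for m, v in zip(self.means[nearest], point)]
        return nearest


# Both seeds are deliberately taken from the same cluster.
km = StreamingKMeans(seeds=[[0.0, 0.0], [0.1, 0.0]], max_clusters=2)
stream = [[10, 10], [0, 0.2], [10.2, 10], [0.1, 0], [9.8, 10.1]]
ids = [km.train(p) for p in stream]
print(ids, km.means)
```

In this toy stream the first distant point triggers a merge of the two near-identical seeds, after which the two means track the two groups. This mirrors the recovery from badly chosen initial points that Fig. 5 describes, under the assumptions stated above.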
After a spike is detected, the SD is clock-gated and the FE, DR, classifier, and SRAM are activated to process the next 48 data points of that particular channel. The cluster means of the corresponding channel are then fetched from the SRAM into the local registers of the classifier. The classifier searches for the cluster with the shortest distance to the new spike, and the index of that cluster is sent out as the neuron ID. A sub-threshold 8T SRAM is designed to minimize the area and leakage power of the system (Fig. 9). An auto-biased bit-line keeper provides a reliable sensing margin in near- and sub-threshold operation, so that a single power supply can be used for both the digital core and the SRAM.

Measurement Results

The chip was fabricated in a 65-nm CMOS process and occupies 0.414 mm² (Fig. 10). The functionality of the chip was verified using the datasets in [2], and the clustering accuracy is similar to Fig. 6. The minimum operating voltage is 0.54 V while operating at 3.2 MHz (Fig. 11). The design consumes 0.175 μW/ch and occupies 0.0033 mm²/ch, which are 2.6× and 10× smaller than previous designs, respectively. The power improvements are mainly due to the time-multiplexing SD, the reduced number of features in the FE & DR block, and the sub-threshold SRAM. The area improvement is due to the proposed algorithm and the use of SRAM rather than a register file. Table I compares this work with other state-of-the-art spike sorting designs. To our knowledge, this is the first real-time multi-channel spike-sorting chip that includes SD, alignment (AL), FE, DR, and clustering.

References

[1] S. Gibson et al., IEEE Signal Process. Mag., Jan. 2012.
[2] R. Q. Quiroga et al., Neural Comp., Aug. 2004.
[3] R. Olsson and K. Wise, IEEE JSSC, Dec. 2005.
[4] M. Chae et al., ISSCC Dig. Tech. Papers, Feb. 2008.

2016 Symposium on VLSI Circuits Digest of Technical Papers. 978-1-5090-0635-9/16/$31.00 ©2016 IEEE

Fig. 1. Architecture of on-chip SS. Clustering consists of a training engine and a classifier, which often require access to memory.
Fig. 4. Proposed hardware-friendly k-means algorithm that is tolerant to wrongly assigned initial clusters and needs neither iterations nor storage of the full dataset.
Fig. 5. Cluster means converged to 4 clusters during the training phase even when the initial data points are purposely chosen from the same cluster.
Fig. 7. SD architecture. RBi and Clki indicate the register banks and gated clocks for each channel. The arithmetic unit performs the multiplierless calculation.
Fig. 8. Power comparison between interleaved [4] and time-multiplexing architectures.
Fig. 9. Sub-threshold 8T SRAM with BL leakage compensation to improve sensing reliability at ultra-low voltage.

Similar resources

A 130-μW, 64-Channel Spike-Sorting DSP Chip

Spike sorting is an important processing step in various neuroscientific and clinical studies. An on-chip spike-sorting DSP must provide data-rate reduction while maintaining a power density much less than 800 μW/mm². Most existing designs either provide only spike detection for multi-channel processing, or they provide detection and feature extraction only for a single channel. We demonstrate a ...

Data Compression in Brain-Machine/Computer Interfaces Based on the Walsh-Hadamard Transform

This paper reports on the application of the Walsh-Hadamard transform (WHT) for data compression in brain-machine/brain-computer interfaces. Using the proposed technique, the amount of the neural data transmitted off the implant is compressed by a factor of at least 63 at the expense of as low as 4.66% RMS error between the signal reconstructed on the external host and the original neural signa...

A Multi-Channel Low-Power System-on-Chip for in Vivo Recording and Wireless Transmission of Neural Spikes

This paper reports a multi-channel neural spike recording system-on-chip with digital data compression and wireless telemetry. The circuit consists of 16 amplifiers, an analog time-division multiplexer, a single 8 bit analog-to-digital converter, a digital signal compression unit and a wireless transmitter. Although only 16 amplifiers are integrated in our current die version, the whole system ...

Channel thickness dependency of high-k gate dielectric based double-gate CMOS inverter

This work investigates the channel thickness dependency of high-k gate dielectric-based complementary metal-oxide-semiconductor (CMOS) inverter circuit built using a conventional double-gate metal gate oxide semiconductor field-effect transistor (DG-MOSFET). It is espied that the use of high-k dielectric as a gate oxide in n/p DG-MOSFET based CMOS inverter results in a high noise margin as well...

A 75-µW, 16-Channel Neural Spike-Sorting Processor With Unsupervised Clustering

We describe a neural spike-sorting processor that provides unsupervised clustering simultaneously for 16 channels. The use of a two-stage clustering algorithm, noise-tolerant distance metric, and selectively clocked high-VT register arrays makes online clustering feasible for implementation. The spike-sorting processor has a power consumption of 75 μW at 270 mV and an area of 2.45 mm² in a...

Publication date: 2016
